ABSTRACT


More than 68 muscles are activated, either simultaneously or sequentially,
during speech production. Monitoring the signals from all these muscles at
once requires a large number of sensors, and such a system is very expensive.
In Quran therapeutic treatment applications, the use of specific muscles is
very important for producing correct Arabic pronunciation.


Proper pronunciation improves the reader's understanding of what is
being read, thus supporting the effectiveness of the therapy process.
The objective of this study is to identify the optimal muscle locations
for monitoring the quality of a recitation during the Quran therapeutic
process, based on the information content embedded in their
Electromyogram (EMG) signals. The Empirical Mode Decomposition (EMD)
technique was used to extract features from the EMG, while a combination
of Hilbert-Huang Spectral Entropy (HHSE) and Kullback-Leibler Divergence
(KLD) was used to quantify the information entropy content of each
feature. This combination made it possible to rank ten speech muscles
widely used in the literature by their information content. Four muscle
locations are suggested, which are believed to be sufficient for
developing a low-cost self-assessment system for monitoring
Quran recitation.


1. INTRODUCTION

Quran therapy is a psychological therapy adopted by Muslims as a reliable alternative
medicine [1]. It involves the recitation of Ruqyah (specific verses of the Quran) as a form of meditation to
reduce stress and depression [2]. Ruqyah should be recited correctly so that the recitation (in Arabic)
conveys its true meaning, thus maximizing the benefit of the therapy for a patient [3-4]. A Quran
therapeutic system that can automatically measure the quality of a reading would directly help patients
correct their pronunciation.

Automatic Speech Recognition (ASR), specifically in Arabic, has been extensively studied [5-6].
ASR works by translating a speech signal into its contextual information. Ambient noise and the
user's speech impairment are two main challenges in developing a high-precision ASR system [7].
In Quran therapy applications, however, the desired system must be judged not only by its ability
to recognize speech, but also by its capability to assess the extent to which the speech is generated using
the right anatomical parts of the human body. This led to the use of the Electromyogram (EMG) as an
alternative source for analyzing the recitation of Ruqyah during the therapy.

Using EMG to recognize speech is not a new approach in ASR [8-9]. Researchers tend to use the
term Silent Speech Recognition (SSR) when the acoustic speech signal is replaced by EMG.
Several areas have been widely studied, which include:

a) Specific applications, such as prosthesis control [10], oral motor treatment for speech production [11]
and improving communication while the subject wears a self-contained breathing apparatus (SCBA) [12].

b) Feature extraction and classification techniques [13-16] 

c) Language or Dialect variation [17-19] 

d) Speech synthesis [20-22] 

This study looks into another equally important factor in EMG-based speech recognition
technology, namely the impact of muscle selection on the quality of speech. Speech production
begins in the lungs, as illustrated in Figure 1. Air exhaled by the lungs travels up through the trachea
and into the larynx.

Figure 1. Speech production system


Within the larynx, there are two folds of ligament known as the vocal cords. The gap between these
vocal cords is called the glottis. During speech production, the vocal cords vibrate rapidly, which opens and
closes the glottis. The continuous stream of air from the lungs is chopped into periodic pulses of air,
which generate an audible but monotonous sound. The pharynx (throat), the oral cavity (the
mouth) and the nasal cavity (the nose) then change the frequency components of this monotonous sound,
producing different types of vowel sounds. The process of articulation eventually reshapes these
sounds into a form that can be recognized by the listener, namely speech. The articulators involved in
reshaping the sounds include the lips, teeth, hard and soft palate, tongue, jaw, posterior pharyngeal wall and
the inner edges of the vocal folds [23].

There are more than 68 muscles involved in speech production [24]. The frequency and
sequence with which these muscles are activated depend upon the type of pronunciation produced.
Monitoring all the muscles at the same time requires a large number of sensors. It is also impractical
because the cost of such a system is high.

From Figure 1, speech production uses mostly muscles located in the
face and neck. We summarize the muscle locations used for EMG-based speech recognition in most of the
literature in Figure 2 [8-22].

Figure 2. Ten widely used EMG locations for speech recognition


Indonesian J Elec Eng & Comp Sci, Vol. 17, No. 2, February 2020: 957 - 967 




The locations E1 to E10 in Figure 2 are described as follows:
(E1) 1 cm to the right of the neck midline.
(E2) 1 cm to the left of the neck midline.
(E3) Centered on the sternocleidomastoid at one-third of the distance from the clavicle to the mastoid.
(E4) 1 cm lateral to the ventral neck midline.
(E5) 1 cm lateral to the submental midline.
(E6) Below the right corner of the mouth.
(E7) Above the right corner of the mouth.
(E8) Centered on the lateral jaw, superficial to the masseter muscle.
(E9) 4 cm lateral to the submental midline.
(E10) Between the left corner of the mouth and the eye.

In this paper, we assess the signals from these locations in an attempt to identify the locations
that carry the most contextual information for speech recognition. Given that the level of muscle activity
depends on the type of pronunciation, this study focuses on muscle activity during Ruqyah recitations.


2. EMG MODELLING USING EMPIRICAL MODE DECOMPOSITION (EMD) 

EMG signals from the 10 muscle locations described in Section 1 are mathematically modeled,
measured and compared to assess the quality of their contextual information. In this work, Empirical Mode
Decomposition (EMD) [25] is used to model all signals, while the Total Hilbert-Huang Spectral Entropy,
$H_j$ [26-27], and the Kullback-Leibler Divergence, $D_{KL}$ [28], are used to measure the quality of
information in these signals. The mathematical background of these techniques is explained in the
remainder of this section.

The Empirical Mode Decomposition (EMD) technique assumes that any given signal, at any
given time, can be decomposed into several coexisting simple natural oscillations, called Intrinsic
Mode Functions (IMFs), superimposed on one another. A cycle of such a simple oscillation must have an
equal number of maxima and minima. However, its amplitude and frequency may vary as
functions of time [29].

Consider an arbitrary discrete-time signal $y(t)$ of length $T$, to be decomposed using EMD as
shown in Figure 3. First, detect all local maxima, $L_{max}[n]$, of $y(t)$. Then connect these $L_{max}[n]$ by a
cubic spline to generate the upper envelope, $y_{max}(t)$, for all $t$. Repeat the same process for every local
minimum, $L_{min}[n]$, to generate the lower envelope, $y_{min}(t)$, for all $t$. The total numbers of local extrema
(maxima and minima), $n$, should be the same or differ at most by one.


Figure 3. Empirical mode decomposition process of an arbitrary signal (labels: $y(t)$, $L_{max}[n]$, $y_{max}(t)$, $L_{min}[n]$, $y_{min}(t)$)


The average signal, $m(t)$, between $y_{max}(t)$ and $y_{min}(t)$ is calculated as

$$m(t) = \frac{y_{max}(t) + y_{min}(t)}{2} \qquad (1)$$

Using (1), the difference, $d(t)$, between $y(t)$ and $m(t)$ is obtained using

$$d(t) = y(t) - m(t) \qquad (2)$$


$d(t)$ is set to be the $j$-th IMF, $c_j(t)$, while $m(t)$ is defined as the $k$-th residue, $r_k(t)$, if $d(t)$
satisfies one of the following criteria [30]:
a) $m(t)$ approaches zero
b) The number of local extrema and the number of zero crossings of $d(t)$ differ at most by one
c) The user-defined maximum number of iterations is reached

If $d(t)$ fails to meet any of the above criteria, $y(t)$ is replaced by $d(t)$ and the overall decomposition
process is repeated until $c_j(t)$ is identified. This process is repeated for $j = 1, 2, \ldots, k$ until $r_k(t)$ is
either a constant, a monotonic slope, or a function with only one extremum [31]. Finally, $y(t)$ can be
represented as


$$y(t) = \sum_{j=1}^{k} c_j(t) + r_k(t) \qquad (3)$$
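
The sifting step in (1) and (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: it assumes NumPy and SciPy are available, the function name `sift_once` is hypothetical, and the two end points of the signal are added to both extrema sets as a simple boundary treatment that the text does not specify.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def sift_once(y):
    """One sifting pass: return the mean envelope m(t) and the detail d(t)."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    interior = y[1:-1]
    # Interior local maxima and minima (strict comparison with both neighbours).
    idx_max = np.where((interior > y[:-2]) & (interior > y[2:]))[0] + 1
    idx_min = np.where((interior < y[:-2]) & (interior < y[2:]))[0] + 1
    # Assumption: pin both envelopes to the end points to limit spline overshoot.
    idx_max = np.concatenate(([0], idx_max, [len(y) - 1]))
    idx_min = np.concatenate(([0], idx_min, [len(y) - 1]))
    y_max = CubicSpline(idx_max, y[idx_max])(t)  # upper envelope
    y_min = CubicSpline(idx_min, y[idx_min])(t)  # lower envelope
    m = (y_max + y_min) / 2.0                    # equation (1)
    d = y - m                                    # equation (2)
    return m, d


# Synthetic two-tone test signal (hypothetical data, not EMG).
t = np.linspace(0.0, 1.0, 1000)
y = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
m, d = sift_once(y)
```

In a full decomposition this pass would be repeated on $d(t)$ until one of the stopping criteria is met, and then repeated again on what remains of the signal to obtain the next IMF.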


The quality or relevance of each $c_j(t)$ can be determined by the correlation coefficient, $\gamma_j$, between
$c_j(t)$ and $y(t)$ as follows:


$$\gamma_j = \frac{\int y(t)\, c_j(t)\, dt}{\sqrt{\int y^2(t)\, dt \int c_j^2(t)\, dt}} \qquad (4)$$
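
As a rough numerical illustration of (4), the sketch below replaces the integrals with sums over samples. It assumes that (4) is the normalized inner product of $c_j(t)$ and $y(t)$; the function name `imf_correlation` and the test signals are hypothetical.

```python
import numpy as np


def imf_correlation(y, c):
    """Normalized inner product between a signal y and one of its IMFs c."""
    y = np.asarray(y, dtype=float)
    c = np.asarray(c, dtype=float)
    return float(np.sum(y * c) / np.sqrt(np.sum(y ** 2) * np.sum(c ** 2)))


# Synthetic check: a signal correlates fully with itself and not at all
# with an orthogonal component of the same frequency.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
s = np.sin(2 * np.pi * 5 * t)
c = np.cos(2 * np.pi * 5 * t)
same = imf_correlation(s, s)
orthogonal = imf_correlation(s, c)
```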


Various types of features can be extracted from (3). The simplest feature is the IMF energy, $P_j$, which
is defined as [32]:


(5) 





The Hilbert Transform (HT) can be used to transform every $c_j(t)$ in (3) into its HT form, $h_j(t)$.
Combining $c_j(t)$ and $h_j(t)$ generates an analytic signal, $z_j(t)$, which can be expressed as


$$z_j(t) = c_j(t) + i\, h_j(t) = a_j(t)\, e^{i\theta_j(t)} \qquad (6)$$


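From (6), the instantaneous amplitude, $a_j(t)$, and the instantaneous frequency, $f_j(t)$, can respectively
be calculated as follows:


$$a_j(t) = \sqrt{c_j^2(t) + h_j^2(t)} \qquad (7)$$

$$f_j(t) = \frac{f_s}{2\pi}\left(\theta_j(t) - \theta_j(t-1)\right) \qquad (8)$$

where $f_s$ is the sampling frequency and $\theta_j(t) = \tan^{-1}\left(\frac{h_j(t)}{c_j(t)}\right)$.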
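Using (7) and (8), the average amplitude, $\bar{a}_j$, and the average frequency, $\bar{f}_j$, can be set as features.
They can respectively be expressed as


$$\bar{a}_j = \frac{1}{T}\sum_{t=0}^{T} a_j(t) \qquad (9)$$

$$\bar{f}_j = \frac{\sum_{t=0}^{T} a_j(t)\, f_j(t)}{\sum_{t=0}^{T} a_j(t)} \qquad (10)$$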
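
The amplitude and frequency features in (7) to (10) can be sketched with NumPy alone. This is an illustrative sketch rather than the authors' code: the analytic signal is built with the usual FFT construction, the average frequency is taken as an amplitude-weighted mean (an assumption about the exact form of (10)), and the function names and the 50 Hz test tone are hypothetical.

```python
import numpy as np


def analytic_signal(c):
    """FFT-based analytic signal z(t) = c(t) + i*h(t), as in (6)."""
    c = np.asarray(c, dtype=float)
    n = len(c)
    spectrum = np.fft.fft(c)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC term
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist term
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def instantaneous_features(c, fs):
    """Return a(t), f(t), mean amplitude and amplitude-weighted mean frequency."""
    z = analytic_signal(c)
    a = np.abs(z)                                   # equation (7)
    theta = np.unwrap(np.angle(z))                  # phase of z(t)
    f = fs / (2.0 * np.pi) * np.diff(theta)         # equation (8)
    a_mean = float(a.mean())
    f_mean = float(np.sum(a[1:] * f) / np.sum(a[1:]))
    return a, f, a_mean, f_mean


# Synthetic check with a unit-amplitude 50 Hz tone sampled at 1000 Hz.
fs = 1000
t = np.arange(1000) / fs
a, f, a_mean, f_mean = instantaneous_features(np.cos(2 * np.pi * 50 * t), fs)
```

For this tone the mean amplitude comes out close to 1 and the mean frequency close to 50 Hz, which is a quick sanity check before applying the same functions to IMFs.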
The frequency-time distribution of the amplitude of $z_j(t)$, designated as the Hilbert amplitude
spectrum, $H_j(\omega,t)$, can now be obtained. Two types of features can be extracted from $H_j(\omega,t)$, namely
the marginal spectrum, $h_j(\omega)$, and the instantaneous energy density level, $IE_j(t)$. They can respectively
be expressed as follows:


$$h_j(\omega) = \int_0^T H_j(\omega,t)\, dt \qquad (11)$$


$$IE_j(t) = \int_\omega H_j^2(\omega,t)\, d\omega \qquad (12)$$


The objective of this work is to inspect the quality of the EMG signals generated from various muscle
locations, in order to find those that best represent the recitation of Ruqyah. A signal is said to carry more
information if it is closer to random and embeds a large amount of uncertainty. Measuring the uncertainty
of a signal therefore reflects the quality of that signal. Entropy is a measurement of randomness and
represents the amount of energy carried by a signal. By applying the Shannon entropy concept to
$H_j(\omega,t)$, a new type of entropy is obtained and denoted as the Hilbert-Huang spectral entropy,
$\hat{h}_j(f)$. To simplify the representation, $h_j(\omega)$ is written as a function of frequency ($f$) instead of
angular frequency ($\omega$) and normalized as follows [33]:


$$\hat{h}_j(f) = \frac{h_j(f)}{\sum_f h_j(f)} \qquad (13)$$


Then, the total entropy $H_j$ can be calculated as follows:

$$H_j = -\sum_f \hat{h}_j(f) \log\left(\hat{h}_j(f)\right) \qquad (14)$$


A higher value of $H_j$ represents a higher uncertainty of a signal.
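
The normalization in (13) and the entropy in (14) amount to a Shannon entropy over a normalized spectrum. The sketch below is illustrative only: the function name `spectral_entropy` is hypothetical, the natural logarithm is assumed because the base is not stated, and the input spectra are made-up values.

```python
import math


def spectral_entropy(marginal_spectrum):
    """Shannon entropy of a marginal spectrum after normalization."""
    total = sum(marginal_spectrum)
    probs = [h / total for h in marginal_spectrum]          # equation (13)
    return -sum(p * math.log(p) for p in probs if p > 0)   # equation (14)


# A flat spectrum is maximally uncertain; a single spike carries no uncertainty.
flat = spectral_entropy([1.0, 1.0, 1.0, 1.0])
spike = spectral_entropy([0.0, 0.0, 5.0, 0.0])
```

With four equal bins the entropy equals $\log 4$, while a spectrum concentrated in one bin gives zero, matching the statement that a higher $H_j$ reflects higher uncertainty.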
Let $P_{H_j}$ and $Q_{H_j}$ represent two different probability distributions of $H_j$. The relative entropy of
$P_{H_j}$ with respect to $Q_{H_j}$ can be defined as


$$D_{KL}\left(P_{H_j} \,\|\, Q_{H_j}\right) = \sum P_{H_j} \log\frac{P_{H_j}}{Q_{H_j}} \qquad (15)$$


$D_{KL}(P_{H_j} \| Q_{H_j})$ is also known as the Kullback-Leibler divergence from $Q_{H_j}$ to $P_{H_j}$ and measures the
amount of information lost when $Q_{H_j}$ is used to approximate $P_{H_j}$.
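
For discrete distributions, (15) can be evaluated directly. The following sketch uses hypothetical two-bin distributions and the natural logarithm (the base is an assumption); it also illustrates that the divergence is not symmetric in its two arguments.

```python
import math


def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D_KL(P || Q).

    Assumes p and q are probability vectors of equal length and that
    q is non-zero wherever p is non-zero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]   # hypothetical reference distribution
q = [0.9, 0.1]   # hypothetical approximating distribution
forward = kl_divergence(p, q)
reverse = kl_divergence(q, p)
```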


3. METHODOLOGY 

In Islamic complementary medicine, there are many surah (or chapters) and verses from the Quran
that can be used as Ruqyah. Among them are verses 1 to 7 from surah ‘Al-Fatihah’, verses 1 to 5 from surah
‘Al-Baqarah’ and verses 18 to 19 from surah ‘Al-Imran’. Some of these verses are too long, so in our
experiment some verses are divided into smaller recordings. As a result, 17 short verses were generated,
to be used as Ruqyah and recited by all participants.

In this study, a total of seven participants were selected from among university students aged between 18
and 25 years. These participants were required to be able to read the verses of the Quran properly
and fluently. To ensure that these conditions were met, participants were required to read the verses to
clerics. Once the clerics were satisfied with the recitation, the recording process began.

Two physiological signals, namely the electromyogram (EMG) and speech, are recorded. The speech
signal is recorded using the built-in microphone of a Dell LATITUDE-E6540 laptop. A Delsys Bagnoli
4-channel system is used to record the EMG signals [34].

All participants recite the selected Ruqyah while their speech and EMG are recorded.
Four EMG locations are recorded simultaneously at a time. Each verse is recorded five times. To ensure
that electrode variability is also taken into consideration, each muscle location in Figure 2 is
recorded repeatedly using all four Delsys Bagnoli electrodes. By the end of the recording process, a total of
2380 EMG and 595 speech recordings had been created.
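
The reported totals can be cross-checked with simple arithmetic. The breakdown below is one reading that reproduces the stated numbers (seven participants, 17 short verses, five repetitions, four EMG channels per speech recording); the pairing of each speech recording with four EMG channels is an assumption, not something the text states explicitly.

```python
participants = 7     # from the text
verses = 17          # short verses generated from the selected surah
repetitions = 5      # each verse is recorded five times
emg_channels = 4     # Delsys Bagnoli 4-channel system

speech_recordings = participants * verses * repetitions
emg_recordings = speech_recordings * emg_channels  # assumed pairing
print(speech_recordings, emg_recordings)  # 595 2380
```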

All signals then undergo pre-processing. First, the DC components of the EMG and speech
signals are removed. A 50 Hz notch filter is applied to both signals to remove
power-line interference. All EMG signals are then examined in the time domain to identify any outliers.
In this work, the boundary between peaks of each signal is set to ±3 times the standard deviation of the overall
peak-to-peak amplitudes. Any amplitudes beyond these boundaries are considered outliers and removed.
Next, a bandpass filter is applied. The passbands for the EMG and speech signals are 10
to 500 Hz and 50 to 3000 Hz, respectively. Both signals are then resampled to 8000 Hz. Finally,
all signals are normalized so that the amplitude between the peaks is equal to 1 [35]. All clean signals are
then processed to measure $H_j$, calculated using (14) as described in the previous section.
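
The filtering and normalization chain described above could look roughly like the following SciPy sketch. Several details are assumptions because the text does not state them: the original sampling rate (`fs_in`, here 2000 Hz for the example), the notch quality factor, the Butterworth order and the use of zero-phase filtering. The outlier-removal step is left out because its exact rule is not fully specified, and the function name `preprocess` is hypothetical.

```python
from fractions import Fraction

import numpy as np
from scipy import signal


def preprocess(x, fs_in, band, fs_out=8000, notch_hz=50.0):
    """DC removal, 50 Hz notch, bandpass, resampling, peak-to-peak scaling."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                      # remove DC component
    b, a = signal.iirnotch(notch_hz, Q=30.0, fs=fs_in)    # Q = 30 is an assumption
    x = signal.filtfilt(b, a, x)
    # 4th-order Butterworth bandpass (order is an assumption).
    sos = signal.butter(4, band, btype="bandpass", fs=fs_in, output="sos")
    x = signal.sosfiltfilt(sos, x)
    ratio = Fraction(fs_out, fs_in)                       # fs_in must be an integer
    x = signal.resample_poly(x, ratio.numerator, ratio.denominator)
    return x / (x.max() - x.min())                        # peak-to-peak amplitude = 1


# Synthetic stand-in for an EMG channel: noise with a DC offset.
rng = np.random.default_rng(0)
raw = rng.standard_normal(4000) + 0.5
clean = preprocess(raw, fs_in=2000, band=(10, 500))
```

For speech, the same function would be called with `band=(50, 3000)`, which requires an input sampling rate above 6000 Hz.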


4. RESULTS AND DISCUSSION 

Using (14), two types of measurement indicators can be used to identify the best
muscle location during the recitation of Ruqyah. The first indicator is the amount of information embedded in
an EMG signal. In representing the recitation of Ruqyah, a muscle is a better choice than the others if it
generates an EMG signal that embeds more information, in this case a higher entropy value $H_j$.

When an EMG signal is decomposed into $k$ IMF signals, it generates $k$ entropy values, $H_j$.
Some IMFs embed more information than others. Let the total information, TH, generated from
a muscle be the sum of the entropies $H_j$ of all $k$ IMFs calculated from the decomposed EMG signal.
Mathematically, it is described as follows:


$$TH = \sum_{j=1}^{k} H_j \qquad (16)$$


Figure 4 illustrates the normal distribution of the total information, TH, calculated using (16) for the 10
muscle locations. Figure 4, based on a total of 2380 EMG recordings, indicates that each
muscle location generates its own normal distribution. The mean, $\mu$, and variance, $\sigma^2$, of the TH
distributions illustrated in Figure 4 are given for each muscle location in Table 1.


Table 1. Mean and Variance for the TH Distributions of Various Muscle Locations
Location  $\mu$   $\sigma^2$    Location  $\mu$   $\sigma^2$
E1        5.33    0.14          E6        5.06    0.08
E2        5.35    0.10          E7        5.40    0.05
E3        5.12    0.13          E8        5.12    0.04
E4        5.14    0.09          E9        5.45    0.11
E5        5.32    0.14          E10       5.38    0.07


From Table 1, the sequence of muscle locations from the highest $\mu$ to the lowest is
E9, E7, E10, E2, E1, E5, E4, E8, E3 and E6. In terms of consistency (the capability to embed similar
information from specific muscles), the sequence from the lowest $\sigma^2$ to the highest is E8, E7, E10,
E6, E4, E2, E9, E3, E1 and E5.

Assuming that the accuracy of the normal distributions in Figure 4 is within $6\sigma$ of each
distribution, the highest value of each distribution is shown in Figure 5. The sequence from the highest
maximum value to the lowest is 6.45 (E9), 6.44 (E1), 6.43 (E5), 6.30 (E2), 6.22 (E3), 6.17 (E10),
6.10 (E7), 6.04 (E4), 5.91 (E6) and 5.72 (E8).
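
The maxima listed above appear to be consistent with $\mu + 3\sigma$ computed from the rounded values in Table 1, that is, with the upper edge of a $6\sigma$-wide interval centered on the mean. The short check below illustrates this for a few locations; this reading of the $6\sigma$ range is an assumption, and small differences are expected because Table 1 is rounded to two decimals.

```python
import math

# (mean, variance) from Table 1 and the maxima reported in the text.
table1 = {"E9": (5.45, 0.11), "E1": (5.33, 0.14), "E2": (5.35, 0.10),
          "E4": (5.14, 0.09), "E8": (5.12, 0.04)}
reported = {"E9": 6.45, "E1": 6.44, "E2": 6.30, "E4": 6.04, "E8": 5.72}

# Upper edge of the interval under the mu + 3*sigma assumption.
upper = {loc: mu + 3.0 * math.sqrt(var) for loc, (mu, var) in table1.items()}
for loc in table1:
    print(f"{loc}: mu + 3*sigma = {upper[loc]:.2f}, reported = {reported[loc]:.2f}")
```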
The second indicator of the quality of the EMG signals generated during the recitation of
Ruqyah is the difference between the total information embedded in the EMG, $TH_{emg}$, and the total
information embedded in the speech, $TH_{spc}$. In the ideal case, $TH_{spc}$ must be equal to the sum of all
$TH_{emg}$ calculated from all muscles involved in generating the speech. Thus, the best muscle for representing
the recitation of Ruqyah generates similar information in the speech and the EMG. In our case, the muscle
that generates the smallest information difference, $\Delta TH$, is better than the others, where $\Delta TH$ is
described as follows:


$$\Delta TH = TH_{spc} - TH_{emg} \qquad (17)$$


Figure 6 illustrates the normal distribution of $\Delta TH$ for all 10 muscle locations. The mean of each
distribution, taken from 2380 EMG recordings, is given in Table 2.

Figure 6. Distribution of the information difference between speech and EMG for various muscle
locations


Based on the mean values given in Table 2, E9 generates the smallest $\Delta TH$, followed by
E7, E2, E10, E5, E1, E4, E3, E8 and E6. From Table 2 and Figure 6, it is also observed that E5
has the highest probability of generating $TH_{emg}$ equal to $TH_{spc}$, followed by E2, E1, E9, E4, E10, E7,
E3, E6 and E8.


Table 2. Mean ($\mu$) of $\Delta TH$ and Normal PDF at $\Delta TH = 0$ for Various Muscle Locations
Location  $\mu$   PDF at $\Delta TH = 0$    Location  $\mu$   PDF at $\Delta TH = 0$
E1        0.93    0.1678                    E6        1.21    0.0505
E2        0.90    0.1682                    E7        0.89    0.0820
E3        1.12    0.0589                    E8        1.18    0.0112
E4        1.10    0.1039                    E9        0.87    0.1612
E5        0.92    0.1772                    E10       0.91    0.0860

The total information lost when specific muscles are used to represent the recitation of Ruqyah can be
calculated using (15). By measuring the normal distributions of speech, $P_{H(spc)}$, and EMG, $Q_{H(emg)}$,
the distance $D_{KL}(P_{H(spc)} \| Q_{H(emg)})$ calculated for all 10 muscle locations is given in Table 3.


Table 3. Total Information Loss when the EMG Signal is Used to Represent Ruqyah Recitation
Location  $D_{KL}$    Location  $D_{KL}$
E1        3.1612      E6        9.2151
E2        3.9916      E7        7.4867
E3        4.6364      E8        17.8414
E4        6.7257      E9        3.3586
E5        3.0796      E10       5.9854
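
If both the speech and EMG information distributions are treated as normal, the divergence has a well-known closed form. The sketch below shows that closed form as one possible way to evaluate such a distance; whether the values in Table 3 were obtained this way is not stated, the function name `kl_normal` is hypothetical, and the example parameters are made up rather than taken from the experiment.

```python
import math


def kl_normal(mu_p, var_p, mu_q, var_q):
    """Closed-form D_KL(P || Q) for two univariate normal distributions."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


# Hypothetical parameters: equal variances, means one unit apart.
example = kl_normal(0.0, 1.0, 1.0, 1.0)
identical = kl_normal(5.3, 0.1, 5.3, 0.1)
```

A larger gap between the means or a narrower approximating distribution both increase the divergence, which is in line with the narrowest distribution in Table 1 (E8) also showing the largest loss in Table 3.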


Table 3 shows that E5 generates the least loss of information, followed by E1, E9,
E2, E3, E10, E4, E7, E6 and E8. Tables 1 to 3 and Figures 4 to 6 give results for the best muscle
locations based on several different information measurements. From these results, we can see that
certain muscle locations repeatedly rank among the best for representing the recitation of Ruqyah.


Table 4. The Best 4 Muscle Locations in Monitoring Ruqyah Recitation
              Results Ranking Score
Location  #1   #2   #3   #4   #5   #6   Total  Rank
E1         6    2    9    5    8    9     39     4
E2         7    5    7    8    9    7     43     2
E3         2    3    6    3    3    6     23     8
E4         4    6    3    4    6    4     27     7
E5         5    1    8    6   10   10     40     3
E6         1    7    2    1    2    2     15    10
E7         9    9    4    9    4    3     38     6
E8         3   10    1    2    1    1     18     9
E9        10    4   10   10    7    8     49     1
E10        8    8    5    7    5    5     38     5


Now, each result is ranked from 10 (best) to 1 (worst) in order to identify the overall best muscle
location. Let results #1 to #6 be defined respectively as the highest mean of TH, the lowest variance of TH,
the highest value of TH, the smallest mean of $\Delta TH$, the highest probability density at $\Delta TH = 0$ and the
least information loss $D_{KL}$. The rankings for results #1 and #2 are taken from Table 1, result #3 is from the
observation of Figure 5, results #4 and #5 are taken from Table 2, and result #6 is taken from Table 3.
The ranking scores of these results for each muscle location are given in Table 4.

From Table 4, we can conclude that E9, E2, E5 and E1 are the four best muscle locations to
represent the recitation of Ruqyah, based on the highest amount of embedded information calculated from
their generated EMG.
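
The scoring behind Table 4 can be reproduced from the six best-to-worst orderings stated in the text. The sketch below assigns 10 points to the first location in each ordering down to 1 point for the last, then sums the points; the variable names are illustrative only.

```python
# Best-to-worst orderings for results #1 to #6, copied from the text.
orderings = [
    ["E9", "E7", "E10", "E2", "E1", "E5", "E4", "E8", "E3", "E6"],  # #1 mean of TH
    ["E8", "E7", "E10", "E6", "E4", "E2", "E9", "E3", "E1", "E5"],  # #2 variance of TH
    ["E9", "E1", "E5", "E2", "E3", "E10", "E7", "E4", "E6", "E8"],  # #3 maximum TH
    ["E9", "E7", "E2", "E10", "E5", "E1", "E4", "E3", "E8", "E6"],  # #4 mean of delta TH
    ["E5", "E2", "E1", "E9", "E4", "E10", "E7", "E3", "E6", "E8"],  # #5 pdf at delta TH = 0
    ["E5", "E1", "E9", "E2", "E3", "E10", "E4", "E7", "E6", "E8"],  # #6 information loss
]

totals = {}
for order in orderings:
    for position, location in enumerate(order):
        # First place scores 10, last place scores 1.
        totals[location] = totals.get(location, 0) + (10 - position)

best_four = sorted(totals, key=totals.get, reverse=True)[:4]
print(best_four)  # ['E9', 'E2', 'E5', 'E1']
```

Because every result carries the same weight in this sum, a location such as E8 that is best on one criterion (lowest variance) can still rank near the bottom overall.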

In the previous section, we have shown that E9, E2, E5 and E1 are the four best muscle
locations for generating the highest-quality signals with the richest embedded information while reciting
Ruqyah. All results used to support this finding are based upon the TH values
calculated using (16). Let us inspect visually two different EMG signals, taken from two different locations
and generating two different TH values, as shown in Figure 7.

Both signals are taken from the same subject while reciting the 4th verse of surah ‘Al-Baqarah’.
The first and second EMG signals generate TH values of 6.4537 and 4.8890, respectively. As shown in
Figure 7, the EMG signal generating the higher TH is more active. This supports our argument in the previous
section that muscles generating higher TH are desirable for monitoring the Ruqyah recitation.

Although this study suggests the four best muscle locations for assessing the quality of Ruqyah recitation,
this does not mean that the other muscle locations discussed in this paper cannot be used for
modeling speech. Our experimental setup is purposely designed so that there are 4-second delays before
and after the verses of Ruqyah are recited, as shown in Figure 8.


Normally, during speech activity, some muscles contract earlier or later than the moment the
speech is heard. An example of such a muscle is E6, as shown in Figure 8. In our study, all calculated TH
values exclude the EMG information during the delays. In other words, the four muscle locations
recommended in this study generate the highest EMG activity while speech is heard.


5. CONCLUSION 

In this study, we have proposed the four best muscle locations that could be used to monitor the
quality of the recitation of Ruqyah. Three of these locations are on the neck while one is on the face.
This conclusion is the result of analyzing the information content embedded in the EMG. All EMG
signals were modeled using Empirical Mode Decomposition (EMD), with the Hilbert amplitude spectrum
selected as the extracted feature. Two techniques for measuring information, namely the Total
Hilbert-Huang Spectral Entropy, TH, and the Kullback-Leibler Divergence, $D_{KL}$, were applied to the extracted
features. TH measures the amount of information embedded in a signal. Five out of six results obtained in
this study are based on TH. Our study indicates that an EMG with higher TH represents a more active muscle
with a strong contraction. On the other hand, $D_{KL}$ measures the amount of information lost when an EMG
from a muscle is used to represent speech. These muscles are easier to monitor for the purpose of assessing
the quality of the recitation, given their strong EMG amplitude and high muscle contraction.